CHAPTER 1. INTRODUCTION AND SUMMARY


The reproductive quality of voice communication on a packet-switched
network is enhanced by reducing the total delay between generation of
the original signal and its reproduction at the destination, and by
maintaining, as much as possible, the continuity or smoothness of the
reproduction. In general, these two goals cannot be pursued
simultaneously: smoothness can be ensured by accepting a sufficiently
long delay, while minimizing delay can be accomplished only at the
expense of losing part of the transmitted message or "sliding time",
which affects smoothness adversely.

The  delay  factor  depends  upon  packet  size,  network  transit 
time  and  initial  wait  between  receipt  and  playback  of  the  first 
packet  of  a burst  of  communication.  Smoothness  depends  upon  the 
same  factors.  Since  the  sender  and  receiver  have  no  control  over 
network  transit  time,  they  must  pursue  an  optimal  strategy  of 
choice  of  packet  size  and  wait  factor  to  maximize  reproductive 
quality.

During  this  quarter,  we  analyzed  trace  recordings  from  previous 
conferences  to  determine  the  nature  of  delays  and  the  range  of 
variations  in  network  transit  time.  We  conjectured  that  a system 
of  automatic  adjustment  of  the  wait  factor  to  comply  with  current 
network  conditions  would  be  a good  method  for  reducing  delay  to  the 
minimum  consistent  with  smooth  reproduction.  Such  a method  was 
implemented  in  our  LPC  conference  programs  and  tested  in  conferences 
with ISI.

Details of the algorithm are discussed in Section 3, which also
contains graphs of delays for a conference in which automatic delay
adjustment was employed, as well as for a conference in which a
single delay factor was used throughout the conference. The effect
of packet size on reproductive quality is discussed in Section 2.
CHAPTER 2. EFFECT OF PACKET SIZE ON REPRODUCTIVE QUALITY

Let us confine our discussion to a single burst of speech containing
no periods of silence. Assume that time is measured in units of
frames, that a parcel contains the parameters for one frame, and
that a packet (or message) contains n parcels. If the data for the
first parcel was generated at time t, the packet cannot be sent
until time t+n. There is thus a delay of at least

[n + network transit time]

between generation of data and its playback. At each node of the
transmission, the entire packet must be received before it can be
resent. Thus the network transit time itself increases with n.

If the message length (in terms of the number of parcels) varies,
the wait W (between receipt and playback of the first packet of the
burst) should be at least as long as the maximum number of parcels
in any message of the burst, minus the number of parcels in the first
packet.

W >= max PC - PC of first message          (1)

Otherwise, assuming the network transit time was approximately
constant for packets in the burst, the time for playback of the longest
message would arrive before that message itself was received. This
might also happen for other messages of the burst. See Figure 1 for
a schematic drawing of this situation.
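Relation (1) can be checked directly from the parcel counts of a burst. The following is a minimal sketch in Python; the function names and the sample parcel counts are illustrative assumptions, not part of the conference programs.

```python
def minimum_wait(parcel_counts):
    """Smallest wait W (in frames) that satisfies relation (1) for one burst.

    parcel_counts lists the parcel count (PC) of each message in the
    order the messages were sent.
    """
    if not parcel_counts:
        raise ValueError("a burst must contain at least one message")
    return max(parcel_counts) - parcel_counts[0]


def satisfies_relation_1(wait, parcel_counts):
    """True if the wait is long enough for the longest message in the burst."""
    return wait >= minimum_wait(parcel_counts)


# Illustrative burst: messages of 4, 3 and 9 parcels.
burst = [4, 3, 9]
needed = minimum_wait(burst)
```

For this burst the wait must be at least 5 frames, so a wait of 4 frames fails the check.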


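[Schematic timeline of three messages of 4, 3 and 9 parcels, marking
E(2), A(3), B(3) and C(3)=D(3), with the network transit time (NTT)
shown after message 3 is sent. Message 3 is due to be played back at
E(2), but the message has not yet arrived.]

Figure 1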

A(i) - time the first parcel of message i was generated.
B(i) - time message i was sent.
C(i) - time message i was received.
D(i) - beginning of playback for message i.
E(i) - end of playback for message i.


NOTE:

1. For each message, the data is represented by a count of
   parcels (e.g., message 1 has four parcels).

2. C(i)-B(i) is network transit time, assumed here to be equal
   to 3.

3. A(i+1) = B(i)

4. D(i+1) = E(i) for smooth playback.

5. The wait, W, in this example is 4.

6. For message 3, length = 9, length of message 1 = 4 and W = 4.
   Relation (1) does not hold. Thus message 3 is due to be
   played back before it is received.
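The timeline of Figure 1 can be recomputed from notes 2 through 5. The sketch below is an illustration only: it assumes the first parcel is generated at time 0, takes the parcel count of message 2 to be 3 as drawn in the figure, and delays playback of a late message until it arrives. The function and key names are hypothetical.

```python
def playback_timeline(parcel_counts, transit_time, wait):
    """Return one dict per message with times A, B, C, due, D and E in frames."""
    timeline = []
    generated = 0       # A(1): assumed time origin
    previous_end = None
    for count in parcel_counts:
        sent = generated + count           # B(i): all parcels must exist first
        received = sent + transit_time     # C(i) = B(i) + network transit time
        if previous_end is None:
            due = received + wait          # first message waits W after receipt
        else:
            due = previous_end             # D(i+1) = E(i) for smooth playback
        start = max(due, received)         # a late message forces time to slide
        end = start + count                # E(i)
        timeline.append({"A": generated, "B": sent, "C": received,
                         "due": due, "D": start, "E": end})
        generated = sent                   # A(i+1) = B(i)
        previous_end = end
    return timeline


figure_1 = playback_timeline([4, 3, 9], transit_time=3, wait=4)
message_3 = figure_1[2]
```

With W = 4, message 3 is due at time 18 but is received at time 19, so playback slides by one frame. With W = 5, the smallest value allowed by relation (1), message 3 arrives exactly when it is due.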

In setting packet size, there are a number of considerations.
If packet size varies, the longest message must be allowed for in
setting W, otherwise smoothness will suffer; yet allowing for the
longest message increases the overall delay. On the other hand,
sending packets of constant parcel length facilitates smoothness of
reproduction. If that constant parcel count is small, overall delay is
reduced. However, sending very short packets would be an inefficient
use of the network. If all transmissions on the network were in
minimal packets, the network transit time might degrade.

Analysis of trace recordings disclosed that two algorithms were used
for packing messages for transmission.

1. Send a packet as soon as a preset minimum number of
parcels m has been generated, as long as a maximal bit length has not
been exceeded. Increase message size to maximal bit length whenever
a backlog of messages to be sent builds up.

2. Send a packet as soon as a preset minimum number of
bits has been generated, so long as a maximum parcel count M has not
been exceeded. Increase message size to maximal bit length whenever
a backlog of messages to be sent builds up.

Method 1 results in messages which have relatively constant parcel
count but may differ radically in bit count. Method 2 leads to
messages with stable bit count but whose parcel count varies widely.
If m < M, a shorter delay W would be needed for Method 1 than for
Method 2. Thus, from the point of view of smoothness of reproduction
and minimal delay, the first method is superior.
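The difference between the two packing methods can be seen on a small made-up example. The sketch below is an assumption-laden illustration, not the packing code observed in the traces: the backlog rule is omitted, the per-frame bit counts are invented, and the function names are hypothetical.

```python
def pack_by_parcel_count(frame_bits, min_parcels, max_bits):
    """Method 1: close a message once it holds min_parcels parcels,
    or earlier if the next parcel would push it past max_bits."""
    messages, current = [], []
    for bits in frame_bits:
        if current and sum(current) + bits > max_bits:
            messages.append(current)
            current = []
        current.append(bits)
        if len(current) >= min_parcels:
            messages.append(current)
            current = []
    if current:
        messages.append(current)
    return messages


def pack_by_bit_count(frame_bits, min_bits, max_parcels):
    """Method 2: close a message once it holds min_bits bits,
    or once it reaches max_parcels parcels."""
    messages, current = [], []
    for bits in frame_bits:
        current.append(bits)
        if sum(current) >= min_bits or len(current) >= max_parcels:
            messages.append(current)
            current = []
    if current:
        messages.append(current)
    return messages


# Invented input: six frames of 60 bits followed by twelve frames of 10 bits.
frames = [60] * 6 + [10] * 12
method_1 = pack_by_parcel_count(frames, min_parcels=3, max_bits=1000)
method_2 = pack_by_bit_count(frames, min_bits=120, max_parcels=12)
parcels_1 = [len(m) for m in method_1]
parcels_2 = [len(m) for m in method_2]
```

On this input Method 1 yields six messages of 3 parcels each, so relation (1) is met with W = 0, while Method 2 yields parcel counts of 2, 2, 2 and 12, which requires W >= 10.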
CHAPTER 3. AUTOMATIC DELAY ADJUSTMENT


Let  us  confine  our  attention  to  an  interval  of  speech  preceded 
and  followed  by  silence  and  comprising  a minimum  of  M messages.  We 
use  the  following  abbreviations  for  factors  associated  with  a packet. 

TG - time at which the first frame of the message was generated
TS - time the message was sent
TR - time the message was received
TD - time the message was due to be played back
TP - time at which the message was played back
OT - observed network transit time for the message (=TR-TS)
NT - expected transit time for the message (since OT varies
     drastically from packet to packet, NT is a smoothed version
     of OT)
var NT - variation in network transit time (=OT-NT)
PC - parcel count of the message
D - delay before playback of first packet

D is  a fixed  quantity  for  the  interval;  all  other  factors  can 
vary.  Since  it  is  not  known  at  the  time  the  first  message  is 
received  whether  its  parcel  count  is  large  or  small,  the  time  the 
first  message  should  be  played  back  is  calculated,  not  in  terms  of 
the  time  sent  or  the  time  received,  but  in  terms  of  the  time  the 
first  parcel  was  generated  as: 

TP  = TG  + D + NT  (2) 

D must  accommodate  variation  in  parcel  count  and  variation  in  network 
transit  time. 

The  time  later  messages  are  due  to  be  played  back  is  determined 
by  the  requirement  of  continuity  or  smoothness: 

TP(i+1) = TP(i) + PC(i)

The time the (i+1)st message is due to be played out is the time the
i-th message is played out plus the parcel count of the i-th message.

If  a message  has  not  been  received  when  it  is  due  to  be  played,  its 
playback is delayed until it arrives ("sliding time"). However,
if  a later  message  is  due  to  be  played,  and  has  arrived  before  the 
packet  in  question  is  received,  playback  continues  with  the  later 
message.  The  earlier  message  is  considered  to  be  lost.  If  it 
subsequently  arrives,  it  is  discarded. 

The  time  at  which  a message  is  received  is: 

TR  = TG  + PC  + OT  (3) 


Now  TP-TR,  for  messages  in  an  interval,  gives  a good  indication 
of  whether  the  choice  of  the  delay  factor  was  optimal  for  quality 
reproduction  of  the  interval.  TP-TR  can  never  be  negative,  since 
a message  cannot  be  played  back  before  it  was  received.  But,  if  it 
was  often  0 for  messages  in  the  interval,  that  is  an  indication  that 
time  was  forced  to  "slide"  and  smoothness  was  poor.  Similarly,  if 
TP-TR  stayed  large  for  the  interval,  D could  have  been  smaller  and 
the  continuity  of  playback  would  not  have  suffered.  To  see  this,  we 
calculate  TP-TR  from  (2)  and  (3) 

TP-TR  = D-(OT-NT)-PC  = D-(varNT+PC) 
min  (TP-TR)  = D-max(varNT+PC) 

where  the  min  and  maximum  are  taken  over  all  messages  in  the  interval. 
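As a worked illustration of this relation, the sketch below computes TP-TR for each message of an interval and takes the minimum. It assumes NT is held fixed over the interval and floors each value at zero to reflect the fact that a message cannot be played before it is received; the knock-on effect of a slide on later messages is ignored. All names and sample values are hypothetical.

```python
def playback_slack(delay, observed_transit, expected_transit, parcel_count):
    """TP - TR for one message: D - (varNT + PC), never below zero."""
    var_nt = observed_transit - expected_transit   # varNT = OT - NT
    return max(0, delay - (var_nt + parcel_count))


def minimum_slack(delay, expected_transit, messages):
    """Minimum TP - TR over an interval.

    messages is a list of (observed_transit, parcel_count) pairs.
    """
    return min(playback_slack(delay, ot, expected_transit, pc)
               for ot, pc in messages)


# Invented interval: D = 60 frames, NT = 20 frames, three messages.
interval = [(22, 10), (35, 12), (18, 40)]
m = minimum_slack(60, 20, interval)
```

Here varNT + PC takes the values 12, 27 and 38, so the minimum TP-TR is 60 - 38 = 22 frames, which suggests D could have been about 22 frames smaller without forcing time to slide.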

If there are a sufficient number of messages in the interval
(number of messages greater than M), maximum var NT and maximum PC
are approximately constant from interval to interval. Then a
decrease in D (for the next interval) will result in an approximately
equal decrease in min (TP-TR) for that interval, while an increase in
D would result in an increase in min (TP-TR) for the next interval. Of
course, such a relationship is overridden for large decreases by the
fact that TP-TR >= 0.


In automatic delay adjustment, TP-TR is calculated for each
message received during a time interval. The minimum of those values,
m, over the time interval is used to adjust the factor D used for the
next interval.

Let e be the desired minimum delay over an interval and m the
minimum TP-TR for that interval. D is calculated as follows:


new D = (old D) + 2e,       if m = 0
        (old D) + (e - m),  if 0 < m <= 3e
        (old D) - 2e,       if 3e < m
These equations mean that if 'new D' had been used during the
last interval (rather than 'old D') and 0 < m <= 3e, then the minimum
TP-TR would have been exactly e. If m was greater than 3e in the
last interval, minimum TP-TR would have been decreased by 2e had
'new D' been used. The last equation limits the amount D can change
in any one adjustment.
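The adjustment rule is short enough to state as a function. The sketch below reads the three cases as an increase of 2e when m = 0, a change of (e - m) when 0 < m <= 3e, and a decrease of 2e when m > 3e, which is how the paragraph above describes them. The function and parameter names are hypothetical.

```python
def adjust_delay(old_delay, min_slack, target):
    """Return the new D from the old D, the minimum TP-TR (m) and the target e."""
    if min_slack < 0:
        raise ValueError("TP-TR cannot be negative")
    if min_slack == 0:
        # Time was forced to slide; the true shortfall is unknown, so step up.
        return old_delay + 2 * target
    if min_slack <= 3 * target:
        # Move D by exactly the amount that would have made min TP-TR equal e.
        return old_delay + (target - min_slack)
    # D was far too generous; limit the reduction to 2e per adjustment.
    return old_delay - 2 * target


# With D = 60 and e = 5, as in the conference of Figure 4:
after_slide = adjust_delay(60, 0, 5)
after_small_slack = adjust_delay(60, 10, 5)
after_large_slack = adjust_delay(60, 30, 5)
```

The second and third cases agree at m = 3e, where both give a decrease of 2e, so D changes continuously as m crosses that boundary.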

The  calculation  ("adjustment")  of  D is  performed  whenever  a 
message  arrives  after  a period  of  silence  and  a preset  number  of 
messages  has  been  generated  since  the  last  calculation.  If  an 
insufficient  number  of  packets  has  been  received,  statistics  on 
TP-TR continue to accumulate, and the old D is used to determine
TP  for  the  first  packet  of  the  burst. 

D should  "home  in"  on  the  minimum  delay  which  preserves  continuity 
of  playback. 
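The way D homes in can be illustrated with a small simulation. This is a sketch under stated assumptions: max(varNT+PC) is taken to be constant at an invented value of 30 frames, every burst has 25 messages, received messages are counted rather than generated ones, and the class and method names are hypothetical. The threshold of 20 messages is the one quoted in the caption of Figure 4.

```python
class DelayAdjuster:
    """Accumulates TP-TR statistics and adjusts D at the start of a burst."""

    def __init__(self, delay, target, min_messages):
        self.delay = delay                # current D
        self.target = target              # e, the desired minimum TP-TR
        self.min_messages = min_messages  # messages needed before adjusting
        self.count = 0
        self.min_slack = None

    def record(self, slack):
        """Note TP-TR for one received message."""
        self.count += 1
        if self.min_slack is None or slack < self.min_slack:
            self.min_slack = slack

    def start_of_burst(self):
        """Call when a message arrives after silence; returns the D to use."""
        if self.count >= self.min_messages:
            m, e = self.min_slack, self.target
            if m == 0:
                self.delay += 2 * e
            elif m <= 3 * e:
                self.delay += e - m
            else:
                self.delay -= 2 * e
            self.count = 0
            self.min_slack = None
        # Otherwise statistics keep accumulating and the old D is reused.
        return self.delay


WORST_CASE = 30   # assumed constant max(varNT + PC), in frames
adjuster = DelayAdjuster(delay=60, target=5, min_messages=20)
history = []
for _ in range(5):
    d = adjuster.start_of_burst()
    history.append(d)
    for _ in range(25):
        adjuster.record(max(0, d - WORST_CASE))
```

Under these assumptions D steps from 60 to 50, 40 and then 35, where it stays: the minimum TP-TR is then 35 - 30 = 5, equal to e.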
For convenience, the abbreviations used here are:
TG  - time  first  parcel  generated 
TS  - time  sent 
TR  - time  received 
TD  - time  due  to  be  played  back 
TP  - time  played  back 
NT  - expected  network  transit  time 
OT  - observed  network  transit  time 
PC  - parcel  count 


Shown  below  are  graphs  of  TP-TR  for  parts  of  two  conferences. 
Figure  3 illustrates  the  situation  for  a conference  which  did  not 
have  automatic  delay  adjustment.  D was  set  at  60  for  the  entire 
conference.  When  one  participant  spoke,  D was  insufficient  and 
time  was  frequently  forced  to  "slide".  When  the  other  participant 
spoke,  D was  overgenerous  and  could  have  been  reduced  without 
affecting  continuity. 

The  conference  depicted  in  Figure  4 included  provision  for 
automatic  delay  adjustment.  D was  initialized  at  60  and  e set  to 
5.  While  the  first  participant  spoke,  there  was  no  silence,  and 
D was  not  adjusted.  During  the  interval  the  second  participant 
spoke,  D was  adjusted  until  minimum  TP-TR  approached  5. 


[Plot of TP-TR, measured in frames, against messages, with intervals
marked "ISI talking" and "CHI talking".]

Figure  3 

TP-TR  for  750  messages.  No  automatic  delay. 


D=60 throughout conference. The 0, 5 and 10 level lines are shown.
Intervals  when  ISI  was  talking  are  underlined;  CHI  talking  in  remaining 
intervals.  When  ISI  was  talking,  TP-TR  was  frequently  0,  indicating 
time  slide.  During  last  interval  of  CHI  speech,  D could  have  been 
smaller  and  continuity  would  have  been  preserved. 


[Plot of TP-TR against messages, with intervals marked "ISI talking"
and "CHI talking".]

Figure 4

TP-TR  for  200  messages  with  automatic  delay  adjustment 


0, 5 and 10 level lines indicated. Intervals when ISI was
talking  underlined.  CHI  talking  during  middle  interval.  Downward 
spikes  indicate  silence  bit  on  for  corresponding  message  (delay 
adjusted  if  preceding  interval  contained  at  least  20  messages). 

Min  TP-TR  approaches  5 (e=5)  after  two  adjustments.  Smooth  playback 
throughout  conference. 